Description

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

This invention relates generally to multimedia com- 
munications systems, and more specifically to video 
processing techniques for use in conjunction with such 
systems. 

2. Description of the Prior Art 

Video composition is a technique which simultane- 
ously processes a plurality of video sequences to form 
a single video sequence. Each frame of the single video
sequence is organized into a plurality of windows. Each
window includes frames corresponding to a specific one
of the plurality of video sequences. Video composition techniques
have broad application to the field of multimedia com- 
munications, especially where multipoint communica- 
tions are involved, as in multipoint, multimedia confer- 
encing systems. 

In a multipoint multimedia conference, a "bridge" or 
"multipoint control unit" (MCU) is often used to establish 
multipoint connection and multi-party conference calls 
among a group of endpoints. Generally speaking, the 
MCU is a computer-controlled device which includes a 
multiplicity of communication ports which may be
selectively interconnected in any of a plurality of
configurations to provide communication among a group of
endpoint devices. Typical MCUs are equipped to process
and route video, audio, and (in some cases) data to and
from each of the endpoint devices.

MCUs may be categorized as having either a 
"switched presence" or a "continuous presence", based 
upon the video processing capabilities of the MCU. In a 
"switched presence" MCU, the video signal selected by 
a specially-designated endpoint device considered to 
be under the control of a "conference chairman" is 
broadcast to all endpoint devices participating in the 
conference. Alternatively, a "switched presence" MCU 
may select the particular video signal to be sent to all of 
the endpoint devices participating in the conference by 
examining the respective levels of audio signals re- 
ceived from each of the endpoint devices. However, 
note that the "switched presence" MCU includes no vid- 
eo processing capabilities. Rather, the MCU functions 
in a more limited sense, providing only video switching 
capabilities. Therefore, at a given moment, each of the 
endpoint devices participating in a given conference will 
display a video image from the specially-designated 
endpoint device used by the "conference chairman" or, 
alternatively, each of the endpoint devices will display a 
video image from the endpoint device used by a partic- 
ipant who is currently speaking. 

Since the existing MCU is only equipped to switch
video signals, and cannot implement functions in
addition to switching, each of the endpoint devices is
required to use the same video transfer rate in order to be
able to communicate with other endpoint devices. The
state-of-the-art MCU is described in ITU Document H.243,
"Procedures for Establishing Communication Between
Three or More Audiovisual Terminals Using Digital
Channels up to 2 Mbps", March 1993, and in ITU
Document H.231, "Multipoint Control Units for Audiovisual
Systems Using Digital Channels up to 2 Mbps", March
1993.

In a "continuous presence" MCU, video composition
techniques are employed by the MCU. These video
composition techniques provide for the selection,
processing, and combining of a plurality of video
streams, wherein each video stream originates from a
corresponding endpoint device. In this manner, video
information from multiple conference participants is
combined into a single video stream. The combined video
stream is then broadcast to all endpoint devices
participating in the conference. Such conferences are termed
"continuous presence" conferences because each of
the conference participants can be simultaneously
viewed by all other conference participants. At the
present time, study groups organized by the ITU are
working on the standardization of "continuous presence"
MCUs.

Several techniques have been developed to provide
video composition features for "continuous presence"
MCUs. The most straightforward technique is
termed the transcoding method, which involves the
decoding of a plurality of input video bit streams. These bit
streams are decoded into the pixel domain, and then the
video frames from the plurality of video bit streams are
combined in the pixel domain to form an integrated video
frame. The integrated video frames are then re-encoded
for distribution.

Another technique for providing video composition
features has been developed by Bellcore. This
technique, which may be referred to as bit stream domain
mixing, is useful only in the context of systems
conforming to the ITU H.261 standard. Bit stream domain mixing
operates on image representations, and exploits a
process known as quadrant segmentation. The problem with
this approach is that it is not compatible with existing
terminal equipment, since it requires asymmetric
operation of the endpoint devices. Moreover, since the bit
stream mixer in the MCU is passive, the combined bit
stream may violate the HRD requirement specified in
the H.261 standard.

One state-of-the-art approach to video composition
uses specially-equipped video terminals. Each video
terminal is equipped to divide the video channel into 2-4
sub channels, while transmitting an outgoing video bit
stream on only one of the channels. All of the sub
channels use the same bit rate, the same picture format, and
the same maximum frame rate. The MCU must provide
circuitry for de-multiplexing the sub channels it receives
from each terminal, circuitry for routing the sub channels
appropriately, and circuitry for re-multiplexing the sub
channels prior to transmission to each terminal. Each
terminal includes a video receiver which receives up to
4 sub channels for decoding and display. The advantage
of this approach is that it provides minimal insertion
delay, but this advantage is more than offset by the
requirement for elaborate modifications to existing video
terminals.

SUMMARY OF THE INVENTION 

Video composition techniques are disclosed for 
processing video information from a plurality of sources 
to provide a video image having a plurality of rectangular 
regions. Each rectangular region displays video infor- 
mation from a specific one of the plurality of video sourc- 
es. 

The video information from each video source is in 
the form of an incoming digital bit stream. The digital bit 
stream from a first video source has a first bit rate, and 
the digital bit stream from a second video source has a 
second bit rate where the first bit rate may or may not 
be equal to the second bit rate. The incoming digital bit 
streams are fed to a rate matching circuit which converts 
all incoming digital bit streams to a common bit rate. The 
output of the rate matching circuit is fed to a synchroni- 
zation and multiplexer circuit which places video infor- 
mation from specific digital bit streams into correspond- 
ing rectangular regions of a composite video image. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a block diagram of a video composition 
apparatus according to a preferred embodiment 
disclosed herein; 

FIGs. 2 and 3 are pictorial diagrams representing 
the coding format of a signal conforming to the ITU 
H.261 standard; 

FIG. 4 is a flowchart setting forth a video composi- 
tion method according to a preferred embodiment 
disclosed herein; 

FIG. 5 is a hardware block diagram of a
synchronization and multiplexing unit;

FIG. 6 is a flowchart setting forth the
synchronization and multiplexing procedure used by the system
of FIG. 5;

FIG. 7 is a hardware block diagram setting forth a 
video composition system constructed in accord- 
ance with a preferred embodiment disclosed 
herein; 

FIG. 8 is a block diagram showing an illustrative 
hardware configuration for the video processor of 
FIG. 7; 

FIG. 9 is a hardware block diagram setting forth an
illustrative structure for the discrete cosine
transformation (DCT) processor of FIG. 1; and

FIG. 10 is a data structure diagram setting forth an
illustrative example of DCT coefficient partitioning.

DETAILED DESCRIPTION 

The video composition techniques of the present
invention will be described in the context of an operational
environment substantially conforming to the ITU H.261
standard. That is, the inputs and output of a video
composition system consist of coded video bit streams
which are compressed using a coding format described
in the ITU-T document "Recommendation H.261, Video
Codec for Audiovisual Services at px64 kbits/s", May
1992 and "Description of Reference Model 8", June 9,
1989. The present invention is described in the context
of the H.261 standard for illustrative purposes only, it
being understood that the techniques disclosed herein
are useful in the context of operational environments not
conforming to the H.261 standard.

FIG. 1 shows a hardware block diagram of a coded 
domain video composition system. The inputs to the 
system are first, second, third and fourth coded video 
bit streams 101, 102, 103, 104, respectively, having
respective transmission rates of R1, R2, R3, R4 kbits/sec.
The output signal 150 of the system is the coded video 
bit stream which may have the same transmission rate 
as any one of the inputs. The output rate may be denot- 
ed as R kbits/sec. The inputs represent video informa- 
tion, and are coded in a format known as QCIF which is 
described in the above-referenced H.261 standard. The 
output is coded in a format known as CIF, also described
in the H.261 standard. The output video bit stream is a 
composite video signal representing the composition of 
the four input video sequences. The coded video bit 
streams are the binary representations of video signals 
which are compressed by a coding algorithm described 
in the H.261 standard and then coded according to an 
H.261 syntax. 

FIGs. 2 and 3 are data structure diagrams setting 
forth illustrative coding formats for representing video 
information in accordance with the H.261 standard. Re- 
ferring now to FIG. 2, video information consists of a plu- 
rality of frames 201, 203, 205, 207, 209, 211, 213, 215, 
217, 219, 221. Each of these frames contains a
representation of a two-dimensional video image in the
form of a pixel array. Since a given frame may represent 
a video image at a specific moment in time, a plurality 
of frames may be employed to represent a moving im- 
age. Together, the frames comprise a moving video im- 
age. 

Each of the frames is compressed according to
one of two types of compression algorithms, termed
intra-frame coding (I) and predictive coding (P). For
example, frames 201 and 211 are compressed using
intra-frame coding (I), and frames 203, 205, 207, 209, 213,
215, 217, 219, and 221 are compressed using predictive
coding. The sequence of frames shown in FIG. 2
establishes a data structure for representing a video image in
the form of an encoded video sequence having a
plurality of levels arranged in a two-dimensional array,
wherein each level represents the value of a pixel element.
This encoded video sequence may be termed a coded
video bit stream.

If intra-frame coding (I) is to be applied to a given
frame, such as frame 201, the frame is termed an
I-designated frame, and if predictive coding (P) is to be
applied to a given frame, such as frame 205, the frame is
termed a P-designated frame.

Pursuant to intra-frame coding (I) compression
processes, the I-designated frame 201 is divided into a
plurality of pixel blocks, wherein each block consists of
an array of 8 x 8 pixels. Next, a discrete cosine transform
(hereinafter, DCT) is performed on each 8 x 8 pixel
block, in accordance with procedures well-known
to those skilled in the art, to generate a plurality
of DCT coefficients. Thereafter, quantization is
performed on the DCT coefficients, in accordance with
well-known quantization procedures. These quantized DCT
coefficients constitute compressed video image
information for the I-encoded frame 201.
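
The two intra-frame steps just described (an 8 x 8 DCT followed by quantization) can be sketched as follows. This is a minimal illustration only and not the H.261 quantizer: the uniform `step` parameter, the function names, and the rounding rule are assumptions introduced for the example.

```python
import math

N = 8  # block size used for the DCT


def dct_8x8(block):
    """Forward two-dimensional DCT of an 8 x 8 block (list of 8 rows)."""
    def c(u):
        return math.sqrt(0.5) if u == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out


def quantize(coeffs, step):
    """Uniform quantization with a hypothetical step size (assumption)."""
    return [[int(round(value / step)) for value in row] for row in coeffs]


# A flat gray block: all of its energy lands in the upper left-hand entry.
flat = [[100] * N for _ in range(N)]
levels = quantize(dct_8x8(flat), step=8)
```

For the flat block, only the upper left-hand (lowest-frequency) entry of `levels` is non-zero, which illustrates why the quantized coefficients form a compact representation of smooth image areas.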

Predictive coding (P) is implemented on a
P-designated frame, such as frame 205, by: 1) partitioning the
P-designated frame into a plurality of macro blocks. For
example, if the frame includes a plurality of pixel arrays,
each having 16 x 16 pixels (FIG. 2, 251, 252, 253, 254,
257, 258), the block may be partitioned into 4 contiguous
blocks, wherein each block is an 8 x 8 pixel array; a 16
x 16 pixel array (luminance) together with an 8 x 8 pixel
block (chrominance) and an 8 x 8 pixel block
(chrominance), comprises a macro block 247; 2) for each of the
macro blocks created in step (1), searching the most
recent previously occurring frame (which could be either
a P- or an I-designated frame, but in the present
example is frame 203) for the macro block which contains
image information that is most similar to the image
information in the macro block created in step (1); 3)
generating motion vectors to spatially translate the macro
block found in the prior I or P frame in step (2) to the
location of the similar macro block in the P frame
presently being compressed; 4) generating a predicted
frame from the most recent previously occurring frame
using the motion vectors; 5) on a
macro-block-by-macro-block basis, subtracting the predicted frame from the
P-frame being compressed, to generate blocks of
residues; 6) performing DCTs on the blocks of residues; 7)
quantizing the coefficients of the blocks of transformed
residues; and 8) concatenating the quantized residue
coefficients and the motion vectors to form a
compressed video signal.
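
Steps (2) through (5) above can be sketched for a single block as follows. This is a toy illustration under stated assumptions: the exhaustive search, the sum-of-absolute-differences cost, the 4 x 4 block size, the sign convention of the vector (displacement from the current block to the matching block in the prior frame), and all function names are choices made for the example rather than details taken from the H.261 standard.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(size) for i in range(size))


def best_motion_vector(cur, ref, bx, by, size, search):
    """Step 2/3: exhaustive search within +/- `search` pixels."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= bx + dx and bx + dx + size <= width
                      and 0 <= by + dy and by + dy + size <= height)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]


def residual(cur, ref, bx, by, mv, size):
    """Steps 4/5: subtract the motion-compensated prediction."""
    dx, dy = mv
    return [[cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i]
             for i in range(size)] for j in range(size)]


# Prior frame, and a current frame whose content moved one pixel right.
ref = [[x * x + 3 * y * y for x in range(8)] for y in range(8)]
cur = [[row[max(x - 1, 0)] for x in range(8)] for row in ref]

mv = best_motion_vector(cur, ref, bx=2, by=2, size=4, search=1)
res = residual(cur, ref, 2, 2, mv, 4)
```

Because the current frame is a pure translation of the prior frame, the search finds an exact match and the block of residues is all zeros; in real video the residues are small but non-zero, and it is these residues that are transformed and quantized in steps (6) and (7).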

In an intra-frame coded (I) picture, every macro 
block is intra-coded. That is, each macro block is coded 
without referring to any macro block in the previous I- or
P-frame. In the predictive-coded (P) picture, the macro
block can be either intra-coded or inter-coded. 

To form the coded video bitstream for transmission,
the compressed image information, as well as other
information such as motion vectors, are coded using
specified code words. The code words are then
multiplexed into a layered data structure to form the final
bitstream. In an H.261-like environment, the coded
bitstream is organized into a hierarchical format, the
structure of which is illustrated in FIG. 3.

Referring to FIG. 2, the sequence of frames 201,
203, 205, 207, 209, 211, 213, 215, 217, 219, 221 forms
a coded video bitstream. This bitstream may be
conceptualized as a serial representation of coded frames
which can be processed to form a moving video image
(i.e., a moving picture). A typical sequence of frames is
IPPP..PIPPP...., where I indicates an intra-coded frame,
and P designates a predictive-coded frame. For each
frame 221, the coded bitstream representing the frame
includes a header 263 and coded data 265. Each
header 263 includes a start code and data related to the
respective frame (i.e., picture). In an H.261 system
environment, much of the header information is required for
synchronization purposes. For example, at the frame
(picture) layer for frame 221, header 263 includes a
picture start code (PCS) field 267, a picture number (TR)
field 269, a picture type (PTYPE) field 271, a PEI field
273, and a PSPARE field 274. The PEI field 273 and the
PSPARE field 274 are adapted to accommodate extra
information which may be required for future
applications.

Picture data is segmented into Groups of Blocks
(GOB) 223, 225, 227, 229, 231, 233, 235, 237, 239, 241,
243, and 245. A GOB (for example, GOB 229)
comprises one-twelfth of the coded I-frame (CIF) 221.
Therefore, GOB 229 may be conceptualized as including
one-third of one quarter of a coded I-frame picture area. The
area represented by one-quarter of a coded I-frame
picture may be abbreviated as QCIF. Accordingly, there are
12 GOBs 223, 225, 227, 229, 231, 233, 235, 237, 239,
241, 243, 245 in a CIF frame 221, and three GOBs in a
QCIF frame. The arrangements of GOBs in a CIF/QCIF
picture are depicted in FIGs. 2 and 3.

Each GOB 229 includes a header field 291, fol- 
lowed by a macro block data field 298. The header field 
291 includes a GOB start code (GBSC) field 292, a 
group number (GN) field 293, a group type (GTYPE) 
field 294, a GOB quantizer (GQUANT) field 295, and 
spare information fields in the form of GEI field 296 and 
GSPARE field 297. Each GOB 229 consists of 33 macro 
blocks, such as "macro block 24" (reference numeral 
247) and "macro block 25" (reference numeral 249). The 
arrangement of macro blocks within a GOB is depicted 
in FIG. 2. 
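
The GOB and macro block counts given above can be checked with a little arithmetic, and the same layout suggests how the GOBs of four QCIF inputs could be renumbered to fill one CIF picture. The sketch below relies on the picture sizes and group numbering of the H.261 standard (CIF luminance 352 x 288, QCIF 176 x 144, QCIF groups numbered 1, 3, 5, CIF groups numbered 1 to 12 in two columns). The quadrant names and the assignment of inputs to quadrants are hypothetical and are not taken from this description.

```python
CIF_W, CIF_H = 352, 288      # luminance picture size in pixels
QCIF_W, QCIF_H = 176, 144
MB_SIZE = 16                 # a macro block covers 16 x 16 luminance pixels
MBS_PER_GOB = 33

QCIF_GROUP_NUMBERS = (1, 3, 5)

# Hypothetical quadrant layout: CIF group numbers covering each quarter.
QUADRANT_GROUP_NUMBERS = {
    "top_left": (1, 3, 5),
    "top_right": (2, 4, 6),
    "bottom_left": (7, 9, 11),
    "bottom_right": (8, 10, 12),
}


def macro_blocks(width, height):
    """Number of macro blocks in a picture of the given luminance size."""
    return (width // MB_SIZE) * (height // MB_SIZE)


def remap_group_number(quadrant, qcif_gn):
    """CIF group number for a QCIF GOB placed in the given quadrant."""
    position = QCIF_GROUP_NUMBERS.index(qcif_gn)
    return QUADRANT_GROUP_NUMBERS[quadrant][position]


cif_mbs = macro_blocks(CIF_W, CIF_H)      # 12 GOBs x 33 macro blocks
qcif_mbs = macro_blocks(QCIF_W, QCIF_H)   # 3 GOBs x 33 macro blocks
last_gn = remap_group_number("bottom_right", 5)
```

Under these assumptions, composing a CIF picture from four QCIF pictures amounts to rewriting the group number (GN) field of each incoming GOB, since the macro block data inside a GOB does not depend on where the GOB sits in the picture.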

Each macro block includes a header field 275 fol- 
lowed by a block data field 277. The header field 275 
includes a macro block address (MBA) field 279, a block 
type information (MTYPE) field 281, a quantizer type 
(MQUANT) field 283, a motion vector (MVD) field 285, 
and a coded block pattern (CBP) field 287. The block 
data field 277 of each macro block 247 consists of 6 
blocks, including four luminance blocks Y1 (reference 
numeral 251), Y2 (reference numeral 252), Y3
(reference numeral 253), Y4 (reference numeral 254), one
chrominance block U (reference numeral 257), and one
chrominance block V (reference numeral 259). An
illustrative example of the contents of luminance block U
(reference numeral 257) is set forth in FIG. 2. Note that
this block includes an 8x8 pixel array wherein all pixels
have a luminance value of black.

A block represents a matrix (array) of pixels, e.g.,
8x8, over which a discrete cosine transformation (DCT)
is performed. The array of pixels is represented by a
matrix of pixel array coefficients (AC). The transformed
coefficients (TCOEFF) 301 (FIG. 3) consist of DCT
coefficients occurring first, followed by respective pixel array
coefficients (AC), in the order of their relative
importance. The arrangement of DCT and AC coefficients in
an illustrative block data field 277 (FIG. 3) is shown in
FIG. 4. The block data field 277 (FIG. 3) consists of the
transformed coefficients (TCOEFF) 301 and an end of
block code (EOB) 303 which is appended at the end of
each successively occurring block of data.

Referring back to FIG. 1, the first, second, third, and
fourth input signals 101, 102, 103, and 104, respectively,
each represent a coded H.261 video bit stream having
a transmission rate of R kbits/sec. These input signals
are each buffered by a corresponding receiving buffer
105, 106, 107, 108, respectively. Respective Video
Multiplex Decoders (VMDs) 109, 110, 111, and 112 read the
bit streams from the respective buffers and process the
video bit streams. VMDs 109, 110, 111, and 112 may be
fabricated using a dedicated hardware configuration of
a type known to those skilled in the art. Alternatively,
digital signal processors (DSPs) may be used to
fabricate VMDs 109, 110, 111, and 112, wherein the DSPs
are loaded with software which implements the
functionality of a VMD. The selection of suitable software for use
with the DSPs is a matter well-known to those skilled in
the art.

Irrespective of the manner in which VMDs 109, 110,
111 and 112 are implemented, each VMD may be
conceptualized as a combination of a decoder and a
demultiplexer. When a VMD 109 receives an incoming coded
video bit stream, it de-multiplexes the bit stream,
decodes any header information which has been coded
into the bit stream, and recovers compressed video
information, i.e., video data. The output of each VMD 109
consists of three portions: (1) quantized DCT (discrete
cosine transformation) coefficients, (2) quantization
information, and (3) optional motion vectors.

For VMD 109, a first output 113 provides the DCT
coefficients and quantization parameters, and a second
output 114 provides the motion vectors, wherein the first
and second outputs 113, 114 are obtained from first
input signal 101. Similarly, for VMD 110, a first output 115
provides DCT coefficients and quantization parameters,
and a second output 116 provides motion vectors,
wherein first and second outputs 115, 116 are obtained
from second input signal 102. Likewise, VMD 111 has a
first output 117 and a second output 118, the first output
117 representing DCT coefficients and quantization
parameters, and the second output 118 representing
motion vectors, wherein first and second outputs are
obtained from third input signal 103. VMD 112 has a first
output 119 and a second output 120, first output 119
representing DCT coefficients and quantization
parameters, and second output 120 representing motion
vectors, wherein first and second outputs 119, 120 are
obtained from fourth input signal 104.

The first outputs 113, 115, 117, 119 are coupled to
respective DCT processing units 121, 122, 123, and
124. To reduce delay time and computational
complexity, motion estimation techniques are not employed.
Rather, the motion vectors obtained from second
outputs 114, 116, 118, and 120, respectively, are fed directly
to first input terminals of respective Video Multiplex
Encoders (VME) 129, 130, 131, 132. The VMEs 129, 130,
131, and 132 each perform the function of producing a
new video bit stream.

DCT processing units 121, 122, 123, and 124 are
the units where the DCT coefficients from first, second,
third, and fourth input signals 101, 102, 103, 104 are
further processed. The amount of data produced at each
DCT processing unit 121, 122, 123, 124 is controlled by
respective control signals 143, 144, 145, and 146. These
control signals are produced by a rate control unit 141.

Each DCT processing unit 121, 122, 123, 124 has
a respective output terminal 125, 126, 127, 128. Each
output terminal 125, 126, 127, 128 provides a signal
which includes processed DCT coefficients, and is
coupled to a second input terminal of a respective VME 129,
130, 131, and 132. At each VME 129, 130, 131, 132,
the processed DCT coefficients and motion vectors are
encoded and multiplexed into the layered structure
shown in FIG. 3. The encoded, multiplexed signals
produced by VMEs 129, 130, 131, and 132, in the form of
coded bit streams, are sent to respective buffers 133,
134, 135, and 136.

The buffers 133, 134, 135, and 136 each include
circuitry to ascertain and to indicate buffer status, which
is defined as the occupancy ratio of the memory
locations within the respective buffer 133, 134, 135, and 136.
The occupancy ratio refers to the ratio between the
number of occupied memory locations within a given
buffer and the total number of memory locations within
this buffer. For each buffer 133, 134, 135, 136, the
occupancy levels for the various data transfer rates are
conveyed to a respective buffer output 137, 138, 139,
and 140 in the form of a buffer status indication signal.
The buffer status indication signals at buffer outputs
137, 138, 139, and 140 are applied to a rate control unit
141 to adjust the average data rate produced by each
DCT coefficients processor 121, 122, 123, and 124.

Rate control unit 141 is coupled to each of the DCT
processing units 121, 122, 123, 124. The rate control
unit 141 receives the buffer status indication signals
from buffer outputs 137, 138, 139, and 140, and
computes the number of bits per frame for the video bit
streams at the respective buffers 133, 134, 135, 136.
This computation yields the total number of bits for each
composite CIF frame based on the output transmission
rate R. The total number of bits for each composite CIF
frame is further distributed among four QCIF pictures,
which are represented by the outputs of the four DCT
coefficient processors 121, 122, 123, and 124.

The bits allocated to each QCIF picture are
distributed to all of the macro blocks in a given frame to
determine the targeted number of bits per macro block.
Based upon the targeted number of bits per macro
block, the rate control unit 141 generates corresponding
control signals on signal lines 143, 144, 145, and 146
for the DCT coefficient processors 121, 122, 123, 124.
The characteristics of these control signals are selected
so as to cause the DCT coefficient processors 121, 122,
123, 124 to minimize or eliminate the difference
between the actual number of bits produced for each
macro block and the targeted number of bits per macro block
to be achieved by a specific DCT processor 121, 122,
123, 124. Note that the targeted number of bits per
macro block may, but need not, be the same for each of the
DCT processors 121, 122, 123, and 124. DCT
processors 121, 122, 123, and 124 receive the DCT
coefficients recovered from VMDs 109, 110, 111, and 112, and
further process these coefficients in order to produce the
proper number of coefficients designated by the
respective control signals.
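
The bit budgeting performed by the rate control unit can be sketched as follows. This is a hedged illustration only: the frame rate, the buffer capacity, and in particular the weighting policy (giving a larger share to the QCIF picture whose buffer is emptier) are assumptions made for the example, since the text above states only that the per-frame total is distributed among the four QCIF pictures and then among their macro blocks. All function names are hypothetical.

```python
def occupancy_ratio(occupied, capacity):
    """Occupied memory locations divided by total memory locations."""
    return occupied / capacity


def bits_per_composite_frame(rate_kbits_per_sec, frames_per_sec):
    """Total bits available for one composite CIF frame at output rate R."""
    return rate_kbits_per_sec * 1000 / frames_per_sec


def allocate_targets(total_bits, occupancies, mbs_per_picture=99):
    """Split the frame budget among the QCIF pictures and macro blocks.

    Hypothetical policy: weight each picture by (1 - occupancy), so a
    fuller buffer receives fewer bits. Falls back to equal shares when
    every buffer is full.
    """
    weights = [1.0 - o for o in occupancies]
    total_weight = sum(weights)
    if total_weight == 0:
        weights = [1.0] * len(occupancies)
        total_weight = float(len(occupancies))
    per_picture = [total_bits * w / total_weight for w in weights]
    per_macro_block = [bits / mbs_per_picture for bits in per_picture]
    return per_picture, per_macro_block


# Example: R = 128 kbits/sec at an assumed 10 frames/sec, four half-full buffers.
total = bits_per_composite_frame(128, 10)
occupancies = [occupancy_ratio(n, 1000) for n in (500, 500, 500, 500)]
per_picture, per_mb = allocate_targets(total, occupancies)
```

With equal occupancies the budget is split evenly, so each QCIF picture receives one quarter of the frame budget and each of its 99 macro blocks receives an equal targeted number of bits.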

There are three methods which can be used to
process the DCT coefficients to reduce the total number of
bits under control of rate control unit 141. Each of these
methods provides for the "graceful" degradation of video
quality, wherein humanly-perceptible degradation is
minimized. The first method is termed DCT coefficients
zeroing, the second method is called the requantization
of the DCT coefficients, and the third method consists
of the combination of the first and second methods. In
the first method, DCT coefficients are partitioned into
groups based upon the relative importance of the
various coefficients. DCT coefficients are
generally organized into two-dimensional arrays
wherein the array entries which are relatively close to the
upper left-hand corner of the array include relatively
low-frequency components, as compared with array entries
which are relatively close to the lower right-hand corner
of the array. The relative importance of the various DCT
coefficients is therefore known: the lower frequency
components are more important and the higher frequency
components are less important. Based upon the output
produced by rate control unit 141 on signal lines 143, 144,
145, and 146, the coefficients of the least important
group of each DCT coefficient processor are set to
zero. Here, the control signals on signal lines 143, 144,
145 and 146 consist of a digital representation of the
indices of a plurality of specific importance groups, or
simply the indices of the DCT coefficients within a macro
block, whose coefficients will subsequently be set to
zero. By forcing some DCT coefficients to zero, the
amount of data produced by the DCT coefficients
processors 121, 122, 123, 124 can be properly controlled
by rate control unit 141.

A typical partitioning of DCT coefficients is
illustrated in FIG. 10. The DCT coefficients are arranged in a
two-dimensional array 1000 stored in block data field
277 (FIG. 3). The two-dimensional array 1000 (FIG. 10)
includes eight rows and eight columns. Each entry in the
array corresponds to a specific entry group, such as
Group 1006, Group 1007, or Group 1008. The groups
are based upon the relative importance of the entries
contained therein. Each group includes entries
conforming to a specific range of importance levels. These
importance levels relate to the relative extent to which the
elimination of a particular entry would degrade the
quality of the overall video image in a given frame. In the
example of FIG. 10, Group 1006 is the most important
group, and includes entries having a relatively high level
of importance. Group 1007 includes entries having an
intermediate level of importance, and Group 1008
includes entries having the least importance to the overall
quality of the video image.
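
DCT coefficient zeroing by importance group can be sketched as follows. The actual group boundaries are defined by FIG. 10, which is not reproduced here, so the three-way split below (by distance from the upper left-hand corner of the array) is purely an assumption standing in for Groups 1006, 1007, and 1008; the function names and thresholds are hypothetical.

```python
def importance_group(row, col):
    """Hypothetical partition of an 8 x 8 coefficient array.

    Group 0 is most important (lowest frequencies); group 2 is least
    important (highest frequencies). The thresholds are assumptions.
    """
    distance = row + col
    if distance <= 2:
        return 0
    if distance <= 6:
        return 1
    return 2


def zero_groups(block, groups_to_zero):
    """Return a copy of `block` with every coefficient in the listed
    importance groups set to zero."""
    return [[0 if importance_group(r, c) in groups_to_zero else block[r][c]
             for c in range(8)]
            for r in range(8)]


# The control signal is modeled as the set of group indices to discard.
block = [[5] * 8 for _ in range(8)]
reduced = zero_groups(block, groups_to_zero={2})
surviving = sum(1 for row in reduced for v in row if v != 0)
```

Zeroing lengthens the runs of zeros seen by the Video Multiplex Encoder, which is how discarding the least important group lowers the number of bits produced for the block.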

A second method of processing DCT coefficients is
requantization. The output signal at the first output
113, 115, 117, 119 of each VMD 109, 110, 111, 112
includes two components: quantized DCT coefficients,
and a quantization parameter. In order to determine
values for the DCT coefficients, an inverse quantization
operation is performed on the quantized DCT coefficients
as follows. Let $\{x_i^k,\ i = 0, 1, 2, \ldots, 63,\ k = 1, 2, 3, 4\}$ be the
quantized DCT coefficients of each DCT coefficients
processor $k$, and $\{y_i^k,\ i = 0, 1, \ldots, 63\}$ be the reconstructed
DCT coefficients at each DCT coefficients processor $k$,
with $Q_p^k$ representing the quantization parameter.
Then, with respect to an H.261-like environment, in the
I-coding mode, the reconstructed DC coefficient $y_0^k$ is
calculated using the relationship

$$y_0^k = x_0^k \cdot 8,$$

and the remaining coefficients are calculated using the
formula

$$y_i^k = \left[ x_i^k \cdot 2 + \mathrm{sign}(x_i^k) \right] \cdot Q_p^k,$$

where $i = 1, 2, \ldots, 63$ in I mode and $i = 0, 1, \ldots, 63$ in P
mode, and the sign(w) function is defined as follows:

$$\mathrm{sign}(w) = \begin{cases} 1, & w > 0 \\ 0, & w = 0 \\ -1, & w < 0 \end{cases}$$

To control the amount of data produced by each
DCT coefficients processor 121, 122, 123, 124 (FIG. 1),
the rate control unit computes the proper quantization
parameters $Q_{p,\mathrm{new}}^k$ based on the targeted bits per
macro block and sends these parameters to DCT
coefficients processors 121, 122, 123, 124 to requantize the
DCT coefficients. Let $\{z_i^k,\ i = 0, 1, \ldots, 63,\ k = 1, 2, 3, 4\}$ be the
new quantized DCT coefficients, and $Q_{p,\mathrm{new}}^k$ be the new
quantization parameter obtained from the rate control
unit 141. Then, the new quantized DCT coefficients
are determined by

$$z_0^k = (y_0^k + 4) / 8,$$

where $z_0^k$ is the DC coefficient of the I-coded macro
block. The rest of the coefficients are obtained by

$$z_i^k = y_i^k / (2 \cdot Q_{p,\mathrm{new}}^k),$$

where $i = 1, \ldots, 63$ for intra-coded macro blocks, and
$i = 0, 1, \ldots, 63$ for inter-coded macro blocks, with $k = 1, 2, 3, 4$
corresponding to each DCT coefficients processor 121,
122, 123, 124.
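
The inverse quantization and requantization relationships can be exercised with a short sketch. Several points are assumptions rather than statements of the method: sign(w) is taken as the usual signum (so that a zero level reconstructs to zero), the DC relationship is read as (y + 4) / 8 with integer division, and the quotient y / (2 Qp,new) is truncated toward zero because no rounding rule is stated. The function names are hypothetical.

```python
def sign(w):
    """Signum: 1 for positive, -1 for negative, 0 for zero (assumption)."""
    return (w > 0) - (w < 0)


def reconstruct(levels, qp, intra):
    """Inverse quantization of 64 quantized levels x_i with parameter qp."""
    y = [(2 * x + sign(x)) * qp for x in levels]
    if intra:
        y[0] = levels[0] * 8      # DC coefficient of an intra-coded block
    return y


def requantize(y, qp_new, intra):
    """Requantize reconstructed coefficients y_i with the new parameter."""
    # int() truncates toward zero; the rounding rule is an assumption.
    z = [int(value / (2 * qp_new)) for value in y]
    if intra:
        z[0] = (y[0] + 4) // 8    # DC coefficient of an intra-coded block
    return z


x = [50, 3, -2] + [0] * 61           # quantized levels from a VMD
y = reconstruct(x, qp=4, intra=True)
z = requantize(y, qp_new=8, intra=True)
```

Doubling the quantization parameter roughly halves the magnitude of the non-DC levels, which shortens their code words. With truncation, requantizing at an unchanged parameter returns the original levels, so this choice of rounding leaves a stream untouched when no rate reduction is requested.
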

A third method of video bit rate matching may be
employed in conjunction with a preferred embodiment
disclosed herein. This third method includes all methods
which represent combinations of various features of the
first and second methods. The manner in which the first
and second methods are combined is determined by the
specific applications of a given system. One illustrative
combination of the first and second methods is the
process of using DCT coefficient partitioning to process
intra-coded macro blocks, and then employing
requantization to process the inter-coded macro blocks.

Although the three different processing schemes
described above, which the DCT coefficients processors
121, 122, 123, 124 are equipped to implement, are
satisfactory for lower rate reduction and intra-coded frames,
there is a mismatch, or "drift", between an endpoint device
that transmits video information at a fast rate and
other endpoint devices which decode this video
information at a slower rate. This mismatch is brought about
because the video encoder is required to operate at a
faster bit rate than the video decoder. This mismatch
exists for all the inter-coded frames and is likely to
accumulate with time, unless an intra-coded frame is
periodically inserted into the video bit stream. To control the
accumulation of the mismatch, an improved DCT
processor with mismatch correction elements is shown
in FIG. 9.

FIG. 9 is a hardware block diagram setting forth an 
illustrative structure for the discrete cosine transforma- 
tion (DCT) processor of FIG. 1. The hardware configu- 
ration of FIG. 9 represents an improvement over the 
DCT coefficient processor disclosed before in connec- 
tion with FIG. 1, as well as other existing state-of-the- 
art systems, such as the systems described in an ITU- 
T document entitled, "Low Bitrate Coding (LBC) for 
Videophone", document no. LBC-94-166. One imple- 
mentation described in the ITU document utilizes one 
motion-compensated prediction storage device and two 
transform operations: a forward transform operation, 
and an inverse transform operation. The main purpose 
of this implementation is to correct the "drift", i.e., the 
mismatch, between a video encoder and a video decod- 
er. 

According to a preferred embodiment disclosed herein, the
two transform operations described in the preceding
paragraph are no longer required. Rather, motion
compensation is performed in the transform domain, as the
terms "motion compensation" and "transform domain" are
generally understood by those skilled in the art. With
reference to FIG. 9, one feature of this embodiment is that
the drift error signal stored in a picture memory of a
prediction frame storage device 903 need not be stored with
full accuracy. In particular, only a small number of the
lower-frequency components of the transform coefficients
need to be retained in the picture memory. Since only a
relatively small number of coefficients are now involved in
the motion compensation process, and the transform
operations are no longer needed, implementation of the
embodiments disclosed herein is simplified considerably over
the system described in the above-referenced ITU-T document
identified as no. LBC-94-166.

The simplified system disclosed herein is described 
below with reference to FIG. 9. An improved DCT (dis- 
crete cosine transformation) processor 121 is shown, 
which includes an inverse quantizer 901, a quantizer 
902, and a prediction frame storage device 903. The in- 
verse quantizer 901 accepts an input bit stream from the 
first output 113 of VMD 109 (FIG. 1). The output of in- 
verse quantizer 901 is coupled to a first input of a sum- 
mer 904, and this output is also coupled to a first input 
of a subtractor 906. The output of summer 904 is fed to 
a first input of quantizer 902. A second input of quantizer 
902 is connected to signal line 143 which is coupled to 
rate control 141 circuit (FIG. 1). 

The output of quantizer 902 (FIG. 9) is fed to a sec- 
ond input of subtractor 906. The output of subtractor 906 
is connected to a first input of summer 905. The output 
of summer 905 is coupled to a first input of prediction 
frame memory storage device 903, and a second input 
of prediction frame memory storage device 903 is con- 
nected to the second output of VMD 109 (FIG. 1). The 
output of prediction frame storage device 903 is fed to 
a second input of summer 904 and this output is also 
fed to a second input of summer 905. 

Inverse quantizer 901, quantizer 902, summers 
904, 905, and subtractor 906 are system components 
which are well-known to those skilled in the art. Conven- 
tional components may be used for these items. With 
respect to the prediction frame storage device 903, this 
device includes a video buffer for storing information 
corresponding to one or more video frames, a random- 
access memory device, and a microprocessor for con- 
trolling the operation of the buffer and the random-ac- 
cess memory. The microprocessor is equipped to exe- 
cute a software program adapted to perform the steps 
outlined below in connection with the prediction frame 
storage device 903. 

The hardware configuration of FIG. 9 operates as follows.
Assume that an input video bit stream having a bit rate of
R1 passes from the first output 113 of VMD 109 (FIG. 1) to
the input of inverse quantizer 901 (FIG. 9). One purpose of
DCT coefficients processor 121 (FIGs. 1 and 9) is to
generate an output signal representing transform
coefficients. When the DCT coefficients processor 121 is
initially started up, there is no drift error between VMD
109 and VME 129 (FIG. 1). Therefore, upon initial startup,
inverse quantizer 901 provides an output signal including
transform coefficients, and this signal passes unchanged
through summer 904 to quantizer 902.

The operation of quantizer 902 is controlled by a signal on
signal line 143 from the rate control 141 circuit (FIG. 1),
so as to provide a signal having the desired output bit rate
at the output of buffer 133 of FIG. 1. Note that the output
of quantizer 902 (FIG. 9) represents the DCT coefficients
processor output 125. This output 125 is then recoded and
multiplexed with motion vectors and quantization information
by VME 129 (FIG. 1). The VME 129 may then send the recoded,
multiplexed signal to transmission buffer 133. The signal is
stored in buffer 133 prior to synchronization and
multiplexing at sync and mux 147 (FIG. 1). The fullness, or
buffer memory location occupancy ratio, of transmission
buffer 133 is used to control the quantization levels for
quantizer 902.

Next, assume that the output of inverse quantizer 
901 does not equal the output of quantizer 902. The out- 
put of inverse quantizer 901 will be denoted as "A", and 
the output of quantizer 902 will be denoted as "B". Thus, 
an error of B-A is added to the picture data. This error, 
denoted as Ed, is subtracted from the picture data by 
the system of FIG. 9. At initial startup, Ed is zero, and 
the data pass unchanged through summer 905 to the 
prediction frame storage device 903. Typically, only a 
small number of low-frequency coefficients are fed to 
subtractor 906, and thus, Ed is only an approximation 
of the actual drift error due to requantization. During re- 
coding of the next video frame, Ed is approximately 
equal to the drift error of the previous frame. During mo- 
tion-compensated prediction, prediction frame storage 
device 903 uses motion vectors on the second output 
114 of VMD 109 (FIG. 1) to output a displaced drift error
signal, which will be seen at the DCT coefficients proc- 
essor output 125, and at transmission buffer 133 which 
receives bits at the desired output bit rate. Without cor- 
rection, this drift error will accumulate over time and 
eventually result in unacceptable system performance. 

In order to ameliorate the problem of drift error ac- 
cumulation, the previous frame motion compensated 
drift error Ed is added to the present frame signal A prior 
to requantization by quantizer 902. If quantizer 902 in- 
troduced very little error, this would completely correct 
the drift error accumulation problem. However, since 
quantizer 902 introduces a finite amount of error, the drift 
can only be partially corrected, and the output of sub- 
tractor 906 will not, in general, be zero. Thus, summer 
905 adds the drift error from the current frame to the 
approximate accumulated drift error from previous 
frames to produce an approximate accumulated drift er- 
ror Ed for the current frame. 
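
The correction loop of FIG. 9 can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the disclosed implementation: motion vectors are taken to be zero (so the stored error is reused without displacement), the quantizer is a uniform quantizer with a hypothetical step size `step`, coefficients are held in scan order, and subtractor 906 is taken to form A - B so that adding the stored error at summer 904 compensates the drift. All function names are illustrative.

```python
import math

def requantize(value, step):
    """Uniform quantize-and-reconstruct, rounding to nearest (assumed quantizer)."""
    return step * math.floor(value / step + 0.5)

def correct_block(a, stored_error, step, n_keep):
    """One pass of the FIG. 9 loop for a block of 64 coefficients.

    a            -- output "A" of inverse quantizer 901
    stored_error -- drift error Ed read from prediction frame storage 903
    n_keep       -- number of low-frequency coefficients retained in storage
    Returns (b, new_error): output "B" of quantizer 902 and the updated Ed.
    """
    # Summer 904: add the previous-frame drift error before requantization.
    s = [x + e for x, e in zip(a, stored_error)]
    # Quantizer 902.
    b = [requantize(x, step) for x in s]
    # Subtractor 906: error introduced for the current frame (sign assumed A - B).
    d = [x - y for x, y in zip(a, b)]
    # Summer 905: accumulate with the error carried over from earlier frames.
    new_error = [di + ei for di, ei in zip(d, stored_error)]
    # Only the first n_keep low-frequency coefficients are stored.
    new_error = [v if i < n_keep else 0 for i, v in enumerate(new_error)]
    return b, new_error
```

With all 64 coefficients retained, the stored error for each coefficient never exceeds half a quantizer step in this sketch, which is the sense in which the accumulation is kept under control.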

The prediction frame storage device 903 only has to compute
a small number (i.e., N) of compensated coefficients. Note
that, for intra-blocks of video data, the prediction frame
storage device 903 is programmed to set Ed to zero. The
methods disclosed herein require far fewer computations than
existing processes that use pel domain motion compensation.
An additional advantage of the disclosed methods is that
they require much less memory space than existing prior art
methods.

As mentioned previously, rate control unit 141 generates
four control signals 143, 144, 145, and 146, which serve the
purpose of controlling the amount of data produced by each
DCT processor 121, 122, 123, 124. Therefore, if a control
signal is changed, the composite output video bit stream may
also change. If the input video bit rates R1, R2, R3, R4 are
different, the rate control unit can generate different
control signals to control each DCT coefficients processor
121, 122, 123, 124 to produce the proper composite output.
For illustrative purposes, two operational modes may be
defined. In a first operational mode, according to each
input rate R1, R2, R3, R4 and the required output rate R,
rate control unit 141 allocates the proper amount of
bandwidth to each DCT coefficients processor 121, 122, 123,
and 124. In a special case, where the input rates are the
same, i.e., R1=R2=R3=R4=R, the rate control unit 141
allocates an equal amount of bandwidth to each DCT processor
121, 122, 123, 124. In this case, the control signals
applied to signal lines 143, 144, 145, and 146 are
identical. Therefore, the total numbers of bits generated by
DCT processors 121, 122, 123, and 124 are identical or very
close. The frame rates and the picture quality of each
quarter (QCIF) in the final composite picture (CIF) are the
same.

In a second operational mode, the input video rates are
R1=R2=R3=R4=R, and at least one of the DCT processors, such
as processor 121, is allocated a first amount of bandwidth,
while at least one of the remaining DCT processors 122, 123,
124 is allocated a second amount of bandwidth, such that the
first amount of bandwidth does not equal the second amount
of bandwidth. This mode is particularly useful in the
operational environment of a conference wherein some
participants desire to transmit video data consisting of
still images, such as drawings or figures, as opposed to
moving images.

As there is no need to allocate a large amount of bandwidth
to input signals representing still video images, rate
control unit 141 allocates less bandwidth to these inputs,
while at the same time allocating more bandwidth to those
input signals which carry bit streams of moving video
images. The operational modes may be selected by one of the
conference participants (i.e., "chair control") and fed from
a given endpoint device to the rate control unit 141 via a
signal received on signal line 142 from control processor
840.

FIG. 4 is a flowchart setting forth control procedures for
the two operational modes discussed in the preceding
paragraph. The program commences at block 401, where the
rate control unit 141 (FIG. 1) obtains the output bit rate
and the desired operational mode or modes from the host
control unit via a signal on signal line 142. Also at block
401, the rate control unit 141 (FIG. 1) implements a series
of operations which are determined by the operational
mode(s) specified by the host control unit. For example, let
R_out1, R_out2, R_out3, and R_out4 be the targeted bit rates
for four processed video bit streams having F_out1, F_out2,
F_out3, and F_out4 as the targeted frame rates of these
video bit streams. Note that these bit streams are in the
form of four processed QCIF video bit streams. If the first
operational mode described above is selected, then at the
initialization stage indicated at block 401, rate control
unit 141 performs the following steps:

(1) Specifying the targeted bit rates for the four processed
video bit streams, wherein R_out1 = R_out2 = R_out3 =
R_out4 = R/4;

(2) According to R/4, determining the maximum output frame
rates, F_out1, F_out2, F_out3, and F_out4. Here F_out1 =
F_out2 = F_out3 = F_out4 = F_out;

(3) Sending F_out1, F_out2, F_out3, and F_out4 to the
corresponding endpoint devices via signal line 142 to force
these endpoint devices to operate with the specified maximum
frame rates;

(4) Computing the average bits_per_QCIF frame. Use the
equation: average bits_per_QCIF frame = (R/4) / F_out;

(5) Initializing the four output buffers with initial buffer
fullness (memory location occupancy rate) of B_0/4;

(6) Specifying the targeted bits_per_QCIF frame for the
first frame of the video inputs: targeted bits_per_QCIF =
K * (R / (4 * F_out)), where K is a constant which is chosen
based on the maximum frame rate and the initial buffer
fullness B_0/4;

(7) Calculating the targeted number of bits per macro block:
targeted bits_per_mblk = (targeted bits_per_frame) /
(total_number_of_mblk); and

(8) According to the targeted bits_per_mblk, specifying
control signals at signal lines 143, 144, 145, and 146,
respectively.
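
The initialization arithmetic of the first operational mode can be illustrated as follows. This is a sketch under assumptions, not the disclosed program: the function name, the dictionary keys, and the sample values of K and B_0 are hypothetical, and the source does not state how K is chosen. The figure of 99 macro blocks follows from the QCIF picture size of 176 x 144 pels with 16 x 16 macro blocks.

```python
QCIF_MACRO_BLOCKS = 99  # 11 x 9 macro blocks in a 176 x 144 QCIF picture

def init_mode_one(r_total, f_out, k, b0):
    """Initial rate-control targets when all four inputs share the output rate.

    r_total -- required composite output rate R, in bits per second
    f_out   -- common maximum output frame rate F_out, in frames per second
    k       -- constant K (its selection is not specified; supplied by caller)
    b0      -- initial buffer fullness B_0 shared by the four output buffers
    """
    r_out = r_total / 4                      # step (1): R_out1..4 = R/4
    average_bits = r_out / f_out             # step (4): bits per QCIF frame
    targeted_first = k * average_bits        # step (6): first-frame target
    targeted_mblk = targeted_first / QCIF_MACRO_BLOCKS  # step (7)
    return {
        "r_out": r_out,
        "average_bits_per_qcif_frame": average_bits,
        "initial_buffer_fullness": b0 / 4,   # step (5)
        "targeted_bits_first_frame": targeted_first,
        "targeted_bits_per_mblk": targeted_mblk,
    }
```

For example, with R = 128 kbit/s, F_out = 10 frames/s, and an illustrative K of 1.5, each QCIF stream is allotted 32 kbit/s, averages 3200 bits per frame, and targets 4800 bits for its first frame.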

At macro block processing stage 402, the DCT coefficient
processors 121, 122, 123, and 124 perform the following
steps:

(1) Obtaining a macro block from first outputs 113, 115,
117, and 119 (FIG. 1), respectively;

(2) Obtaining the control signals on signal lines 143, 144,
145, and 146 from the rate control unit 141;

(3) Processing DCT coefficients in accordance with the
control signals obtained in step (2). (Note that, if the DCT
zeroing method is used in a DCT coefficients processor, then
the control signal will be the coefficients size or else, if
the requantization scheme is used, the control signal will
be the quantization parameter.)

Next, at block 403, after the processing of one macro block
has been completed, rate control unit 141 gets the new
status for each buffer by obtaining the control signals on
buffer outputs 143, 144, 145, and 146. Based upon these
outputs, the rate control unit 141 updates the control
signals. The steps for performing this update include:

(1) Obtaining the total number of bits used in a given macro
block from each buffer, which may be specified as
bits_per_mblk;

(2) Computing the difference between the targeted number of
bits_per_mblk and the actual bits_per_mblk for each of the
DCT processors:

bits_difference += targeted bits_per_mblk - bits_per_mblk;

(3) Updating the control signals that rate control unit 141
places on signal lines 143, 144, 145, and 146, based on the
following:

If bits_difference > 0, adjust the control signal to allow
the corresponding DCT processor to process DCT coefficients
in such a way that more bits will be produced at the buffer
output;

else if bits_difference < 0, adjust the control signal to
allow the corresponding DCT processing unit to process DCT
coefficients in such a way that fewer bits will be produced
at the buffer output;

else make no change in the control signals.
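
The per-macro-block update can be sketched as follows. The sketch assumes that the requantization scheme is in use, so that the control signal is a quantization parameter in the range 1 to 31, and it assumes a step of one per update; the source does not specify the size of the adjustment. The function name is hypothetical.

```python
def update_control(bits_difference, targeted_bits_per_mblk, bits_per_mblk, qp):
    """Update the running bit budget and the control signal for one processor.

    Returns the new (bits_difference, qp) pair.
    """
    # Step (2): accumulate the gap between target and actual bits.
    bits_difference += targeted_bits_per_mblk - bits_per_mblk
    if bits_difference > 0:
        # Under budget: a finer quantizer lets more bits be produced.
        qp = max(1, qp - 1)
    elif bits_difference < 0:
        # Over budget: a coarser quantizer produces fewer bits.
        qp = min(31, qp + 1)
    # bits_difference == 0: control signal unchanged.
    return bits_difference, qp
```

If the DCT zeroing method were used instead, the same comparison would raise or lower the number of retained coefficients rather than the quantization parameter.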

At the end of processing each macro block, the macro block
counter is checked against the total number of macro blocks
to ascertain whether or not a frame is finished. If a frame
is finished, rate control unit 141 starts updating the frame
parameters. At block 405, the rate control unit 141 performs
the following steps:

(1) Obtaining the buffer status for each buffer;

(2) Obtaining the total number of bits used by each QCIF
frame (note that the number of bits used by the composite
CIF frame will be the sum total of the number of bits used
by each of the four QCIF frames);

(3) Based on the targeted buffer fullness, computing the
targeted number of bits for the next composite CIF frame
(the bits for each QCIF frame are equal to the targeted
number of bits for the next composite CIF frame divided by
four), and the bits per macro block in each QCIF frame will
be:

bits_per_macroblock = bits_per_CIF frame /
(4 * total_number_of_mblk); and

(4) Based on the targeted number of bits for each macro
block, determining the characteristics of the control
signals for the first macro block of the next frame.

After frame parameter updating, the DCT proces- 
sors are ready to process the next frame. If there are no 
more bits in the receiving buffers, then the procedure of 
FIG. 4 ends. Otherwise, the rate control unit 141 obtains 
a signal specifying an operational mode from signal line 
142. If the operational mode has been changed from the 
mode previously specified, then the entire procedure 
described in FIG. 4 recommences at block 401; otherwise,
the program reverts back to macro block processing at
block 402.

In the case where the second operational mode re- 
ferred to above is specified, rate control unit 141 per- 
forms different procedures than those described in con- 
nection with the first operational mode. These different 
procedures will be described below. For purposes of il- 
lustration, assume that input signal 101 carries the full- 
motion video of a conference speaker, and input signals 
102, 103, and 104 each carry video information corre- 
sponding to still images. Then, with reference to the pro- 
cedures described in FIG. 4, rate control unit 141 per- 
forms the following steps at the initialization stage of 
block 406: 

(1) Based on R/4, specifying the maximum frame rates F_out2,
F_out3, and F_out4:

F_out2 = F_out3 = F_out4 = F_out;

(2) Based on R, specifying the maximum frame rate, F_out1;

(3) Sending the frame rates F_out1, F_out2, F_out3, and
F_out4 to the corresponding endpoint devices via a signal on
signal line 142 to force these endpoint devices to operate
using the specified maximum frame rates;

(4) Initializing the buffer 133 with the initial buffer
fullness B_1;

(5) Computing the targeted bits of the first frame for the
processed input signals 102, 103, and 104:

targeted bits_per_frame = K * (R / (4 * F_out2)),

where K is a constant which is chosen based on the maximum
frame rate F_out; then the targeted bits_per_mblk is

targeted bits_per_mblk = (targeted bits_per_frame) /
(total_number_of_mblk);

(6) According to the targeted bits_per_mblk, specifying the
characteristics of control signals 144, 145, and 146;

(7) Computing the targeted bits of the first frame for the
processed input 101:

targeted bits_per_frame = K * (R / F_out1),

where K is a constant which is chosen based on the maximum
frame rate F_out1 and the initial buffer fullness B_1. Then
the targeted bits_per_mblk is

targeted bits_per_mblk = (targeted bits_per_frame) /
(total_number_of_mblk);

(8) According to the targeted bits_per_mblk, specifying the
characteristics of control signal 143.
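
The first-frame targets for the second operational mode can be sketched in the same spirit. The function name, the separate constants `k_motion` and `k_still`, and the default of 99 macro blocks per QCIF frame are assumptions made for illustration; the source uses the single symbol K in both formulas and does not state its value.

```python
def init_mode_two(r_total, f_out1, f_out_still, k_motion, k_still, mblks=99):
    """First-frame bit targets when input 101 is moving video and the
    other three inputs are still images (hypothetical helper).

    r_total     -- required output rate R, in bits per second
    f_out1      -- maximum frame rate F_out1 for the moving-video input
    f_out_still -- common maximum frame rate F_out for the still inputs
    """
    # Step (5): still-image streams each target K * R / (4 * F_out2).
    still_frame = k_still * r_total / (4 * f_out_still)
    # Step (7): the moving-video stream targets K * R / F_out1.
    motion_frame = k_motion * r_total / f_out1
    return {
        "still_bits_per_frame": still_frame,
        "still_bits_per_mblk": still_frame / mblks,
        "motion_bits_per_frame": motion_frame,
        "motion_bits_per_mblk": motion_frame / mblks,
    }
```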

After the macro block processing stage, the DCT coefficient
processors perform the procedures set forth in block 407 or,
alternatively, the procedures set forth in block 408. The
selection of whether to perform the procedures of block 407
or the procedures of block 408 depends upon whether the
current frame is the first frame. If so, block 407 is
performed; if not, block 408 is performed. The procedure of
block 407 consists of performing the steps set forth in
blocks 402, 403, and 404. After finishing the first frame,
rate control unit 141 starts updating the frame parameters
at block 409. Rate control unit 141 performs the following
steps at block 409:

(1) Obtaining the buffer status at buffer output 137;

(2) Obtaining the total bits used by each QCIF frame (note
that the total number of bits used by the composite CIF
frame will be the sum of the number of bits used by the four
QCIF frames);

(3) Based on the buffer status at buffer output 137 and the
number of bits used by the first composite CIF frame,
computing the targeted number of bits for the next composite
CIF frame;

(4) Allocating the number of bits to be used by each QCIF
frame; here the bits for the QCIF frame for input signal 101
are:

targeted bits_per_frame = (targeted bits_next_CIF) -
(3 * GOB_header_bits);

then, the targeted bits for each macro block in the QCIF
frame for input signal 101 are

targeted bits_per_mblk = (targeted bits_per_frame) /
(total_number_of_mblk);

(5) Based on the targeted bits for each macro block,
determining the characteristics of the control signal on
signal line 143;

(6) Setting the control signals on signal lines 144, 145,
and 146 to clear all the DCT coefficients obtained from
outputs 115, 117, and 119. Setting signals on outputs 138,
139, and 140 to clear buffers 134, 135, and 136.

Referring again to FIG. 4, when the current frame is not the
first frame, the DCT coefficient processors execute the
steps in block 408. At block 408, DCT processing unit 121
executes the steps previously enumerated at blocks 402, 403,
and 404, and DCT processing units 122, 123, and 124 perform
the following steps:

(1) obtaining the control signals 144, 145, and 146;

(2) setting all the incoming DCT coefficients to zero;

(3) clearing buffers 134, 135, and 136 via 138, 139, and
140;

(4) performing the steps previously enumerated at block 404.

After finishing one frame, rate control unit 141 starts
updating the frame parameters at block 410. The steps in
block 410 include:

(1) Obtaining the buffer status at buffer output 137;

(2) Obtaining the total number of bits used by the QCIF
frame corresponding to input signal 101;

(3) Calculating the number of bits used by the composite CIF
frame, which is determined by adding the number of bits used
by the QCIF frame corresponding to input signal 101 to the
number of bits used by three GOB headers;

(4) Based on the buffer status at buffer output 137 and the
targeted buffer fullness, computing the targeted bits for
the next composite CIF frame; note that the targeted number
of bits to be used in the QCIF frame corresponding to input
signal 101 is given by:

targeted bits_per_frame = targeted bits_next_CIF -
3 * GOB_header_bits;

then the targeted bits for each macro block in the QCIF
frame for input signal 101 are

targeted bits_per_mblk = (targeted bits_per_frame) /
(total_number_of_mblk);

(5) Based on the targeted number of bits for each macro
block, generating an appropriate signal for control signal
143.
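
The allocation in step (4) reduces to simple arithmetic, sketched below. The function name is hypothetical, and the default header size of 26 bits (a 16-bit start code, a 4-bit group number, a 5-bit quantizer field, and a 1-bit extension flag, as in an H.261 GOB header with no spare information) is an assumption supplied for illustration; the source does not give a value for GOB_header_bits.

```python
def targets_for_moving_stream(targeted_bits_next_cif,
                              gob_header_bits=26,
                              total_number_of_mblk=99):
    """Bits available to the QCIF frame of input signal 101 (sketch).

    Subtracts three GOB headers from the composite CIF target, as in
    step (4), then spreads the remainder over the macro blocks.
    """
    targeted_bits_per_frame = targeted_bits_next_cif - 3 * gob_header_bits
    targeted_bits_per_mblk = targeted_bits_per_frame / total_number_of_mblk
    return targeted_bits_per_frame, targeted_bits_per_mblk
```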

After frame parameter updating, the DCT processors are ready
for a new frame. If there are no data in the buffers, the
procedure ends. Otherwise, the rate control unit 141 obtains
the operational mode from signal line 142. If the
operational mode has changed, the whole procedure restarts
from block 401; otherwise, the procedure goes back to block
411.

Synchronization and multiplex unit 147 accepts four
processed input bit streams from buffer outputs 133, 134,
135, and 136. The synchronization and multiplexer unit 147
then synchronizes and multiplexes these bit streams to form
a new composite video bit stream 150. The detailed block
diagram and the corresponding relationship between the
buffer outputs 133, 134, 135, and 136 and the output 150 of
the multiplexer are depicted in FIG. 5.

Referring to FIG. 5, the synchronization and multi- 
plexer unit 147 consists of a switcher 500, a multiplexer 
processor 501, and a buffer 502. A control signal on signal
line 148 controls the operation of switcher 500, such
that the switcher switches to a first input buffer at a first 
moment in time, and switches to a second input buffer 
at a second moment in time. This signal line is coupled 
to an endpoint device which includes user interface 
means for entering the desired operational mode (the 
term "operational mode" was defined above). The mul- 
tiplexer processor 501 processes the input data based 
on the operational mode obtained via signal line 148 and
then sends the processed data to the buffer 502. 

FIG. 6 is a flowchart setting forth a procedure executed by
the synchronization and multiplexer unit 147 of FIG. 5. In
FIG. 6, if the first operational mode (as defined above) is
used, the multiplexer processor 501 (FIG. 5) processes the
input data based on the steps set forth in block 601. If the
second operational mode is used, the multiplexer processor
501 processes the input data based on the steps set forth in
block 602.

The steps performed at block 601 include:

(1) Uploading the data of the first GOB from buffer output
133;

(2) Downloading the data of step (1) to buffer 502 (FIG. 5);

(3) Uploading the data of the first GOB from buffer output
134;

(4) Resetting GN=2;

(5) Downloading the data of step (3) as modified by step (4)
to buffer 502;

(6) Uploading the data of the second GOB from buffer output
133;

(7) Downloading the data of step (6) to buffer 502;

(8) Uploading the data of the second GOB from buffer output
134;

(9) Resetting GN=4;

(10) Downloading the data of steps (8) and (9) to buffer
502;

(11) Uploading the data of the third GOB from buffer output
133;

(12) Downloading the data of step (11) to buffer 502;

(13) Uploading the data of the third GOB from buffer output
134;

(14) Resetting GN=6;

(15) Downloading the data of steps (13) and (14) to buffer
502;

(16) Uploading the data of the first GOB from buffer output
135;

(17) Resetting GN=7;

(18) Downloading the data of steps (16) and (17) to buffer
502;

(19) Uploading the data of the first GOB from buffer output
136;

(20) Resetting GN=8;

(21) Downloading the data of steps (19) and (20) to buffer
502;

(22) Uploading the data of the second GOB from buffer output
135;

(23) Resetting GN=9;

(24) Downloading the data of steps (22) and (23) to buffer
502;

(25) Uploading the data of the second GOB from buffer output
136;

(26) Resetting GN=10;

(27) Downloading the data of steps (25) and (26) to buffer
502;

(28) Uploading the data of the third GOB from buffer output
135;

(29) Resetting GN=11;

(30) Downloading the data of steps (28) and (29) to buffer
502;

(31) Uploading the data of the third GOB from buffer output
136;

(32) Resetting GN=12;

(33) Downloading the data of steps (31) and (32) to buffer
502.
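
The thirty-three steps of block 601 follow a regular pattern: GOBs are taken alternately from two buffers, and each is assigned the next group number (GN) in the composite CIF picture. A compact sketch of that ordering is given below. The function name and the representation of each buffer as a list of three GOB payloads are assumptions made for illustration; rewriting the GN field inside an actual coded bit stream is not shown.

```python
def compose_first_mode(buffers):
    """Interleave the GOBs of four QCIF streams into CIF order (sketch).

    buffers -- dict mapping buffer number (133, 134, 135, 136) to a list
               of three GOB payloads in the order they were coded
    Returns a list of (gn, source_buffer, payload) tuples, GN = 1..12.
    """
    composite = []
    gn = 1
    # Buffers 133/134 fill the upper half of the CIF picture,
    # buffers 135/136 the lower half.
    for left, right in ((133, 134), (135, 136)):
        for gob_index in range(3):
            for source in (left, right):
                composite.append((gn, source, buffers[source][gob_index]))
                gn += 1
    return composite
```

The GOBs from buffer 133 keep the numbers 1, 3, and 5 that they carry as QCIF GOBs, which is consistent with steps (1), (6), and (11) having no accompanying "Resetting GN" step.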

After block 601 is executed, the program progresses to block
602. The steps in block 602 include:

(1) If the input data represents the first frame of video,
execute the procedure of block 601; otherwise execute the
following:

(2) Upload the data of the first GOB from buffer output 133
and download the data to buffer 502;

(3) Generate the GOB header with GN=2, and download the data
to buffer 502;

(4) Upload the data of the second GOB from buffer output 133
and download the data to buffer 502;

(5) Generate the GOB header with GN=4, and download the data
to buffer 502;

(6) Upload the data of the third GOB from buffer output 133
and download the data to buffer 502;

(7) Generate GOB headers with GN=6, 7, 8, 9, 10, 11, and 12,
respectively, and download the data to buffer 502.
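
For frames after the first, block 602 sends real data only for the three GOBs of buffer 133 and fills the remaining nine positions with bare GOB headers. A sketch of the resulting order follows; the function name is hypothetical, and a payload of `None` is used here merely to stand for a header-only GOB.

```python
def compose_second_mode(moving_gobs):
    """Build the composite GOB sequence for a non-first frame (sketch).

    moving_gobs -- list of the three GOB payloads from buffer output 133
    Returns a list of (gn, payload) tuples; payload is None where only a
    GOB header is generated.
    """
    if len(moving_gobs) != 3:
        raise ValueError("expected three GOBs from buffer 133")
    composite = [
        (1, moving_gobs[0]),  # step (2)
        (2, None),            # step (3): header only
        (3, moving_gobs[1]),  # step (4)
        (4, None),            # step (5): header only
        (5, moving_gobs[2]),  # step (6)
    ]
    # Step (7): header-only GOBs for GN = 6 through 12.
    composite.extend((gn, None) for gn in range(6, 13))
    return composite
```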

FIG. 7 is a hardware block diagram setting forth an
alternative embodiment of a video composition system. The
system provides first, second, third, and fourth inputs 701,
702, 703, and 704 which accept input signals in the form of
coded video bit streams, similar to the input signals
described in conjunction with FIG. 1. The output signal at
output 710 is a coded video bit stream having a bit rate of
R kbits/s. Signals at first, second, third, and fourth
inputs 701, 702, 703, 704 are buffered using respective
buffers 705, 706, 707, and 708, and then fed to a
synchronization and multiplexer unit 709. At the
synchronization and multiplexer unit 709, the signals at the
first, second, third, and fourth inputs 701, 702, 703, and
704 are combined into one output signal at output 710. The
manner in which these signals are combined is determined by
an operational mode signal on signal line 723. This signal
specifies a desired operational mode for the synchronization
and multiplexer unit 709. For example, this signal may
specify an operational mode wherein the four inputs 701,
702, 703, 704 are combined in equal portions to form one
output signal at output 710. The four inputs 701, 702, 703,
704 are each in a QCIF format. The signal at output 710 is
in a CIF format which includes a composite of the four QCIF
inputs with a transmission rate of R1 + R2 + R3 + R4
kbits/s. To match the output transmission rate, which is R
kbits/s, the signal at output 710 is sent to a video
transmission rate reduction system 740. The video
transmission rate reduction system 740 includes a Video
Multiplex Decoder (VMD) 711, a DCT coefficient processor
714, a Video Multiplex Encoder (VME) 716, a transmission
buffer 718, and a rate control unit 720. The detailed
functionality and the operation of the video transmission
rate reduction system 740 are disclosed in the
previously-cited patent application filed on the same date
as the present patent application by the identically-named
inventors and entitled, "Video Transmission Rate Matching
for Multimedia Communications Systems".

The synchronization and multiplexer unit 709 is vir- 
tually identical to that described in conjunction with ref- 
erence numeral 147 of FIG. 1, with the exception that 
control signal 148 is replaced by data uploaded from 
buffers 705, 706, 707, and 708. 

The video composition systems shown in FIGs. 1 
and 7 can be implemented, for example, by using a gen- 
eral-purpose microprocessor, a digital signal processor 
(such as an AT&T DSP 3210 or an AT&T DSP 1610), 
and/or a programmable video processing chip (such as 
an integrated circuit known to those skilled in the art as 
the ITT VCP chip). 

Multimedia System Using Video Processing of the
Present Invention

To illustrate various typical applications for the present
invention in the context of multimedia conferencing, FIG. 8
shows a multimedia system using a video processor embodying
the coded domain video composition techniques disclosed
herein. Referring now to FIG. 8, a block diagram setting
forth the system architecture of a multimedia conferencing
system 800 is shown. The conferencing system includes an MCU
810, an ISDN network 804, and a plurality of endpoint
devices such as first endpoint device 801, second endpoint
device 802, and third endpoint device 803.

Endpoint devices 801, 802, and 803 are coupled to MCU 810
via ISDN network 804. These endpoint devices 801, 802, and
803 may include one or more user interface devices. Each
interface device includes either an input means, an output
means, or an input means combined with an output means.
Output means are adapted to convert multimedia electronic
signals representing audio, video, or data into actual
audio, video, or data. Input means are adapted to accept
audio, video, and/or data inputs, and to convert these
inputs into electronic signals representing audio, video,
and/or data. Examples of user interface devices include
video displays, keyboards, microphones, speakers, and video
cameras, or the like.

Endpoint devices 801, 802, and 803 are adapted to
communicate using existing multimedia communication
protocols such as ISDN. The endpoint device multimedia
communication protocol controls the presentation of media
streams (electronic signals representing audio, video,
and/or data information) to the endpoint device user.
Endpoint devices may function bi-directionally, both sending
and receiving multimedia information, or, alternatively,
endpoint devices may function uni-directionally, receiving
but not sending multimedia information, or sending but not
receiving multimedia information.

An example of a suitable endpoint device is an ITU-T H.320
audiovisual terminal, but any device capable of terminating
a digital multimedia stream and presenting it to the user
constitutes an endpoint device. A particular product example
of an H.320-compatible endpoint is the AT&T-GIS Vistium.

MCU 810 is a computer-controlled device which includes
a multiplicity of communications ports, such as
first communications port 870 and second communications
port 872, which may be selectively interconnected
in a variety of ways to provide communication among a
group of endpoint devices 801, 802, 803. Although the
system of FIG. 8 shows two communications ports, this
is done for illustrative purposes, as any convenient
number of communications ports may be employed.
MCU 810 also includes a control processor 840, an audio
processor 841, a video processor 842, a data processor
843, and a common internal switch 819. Each
communications port includes a network interface, a
demultiplexer, and a multiplexer. For example, first
communications port 870 includes network interface 811,
demultiplexer 813, and multiplexer 822.

Although MCU 810 is shown with two communications
ports 870, 872 for purposes of illustration, MCU
810 may, in fact, include any convenient number of
communications ports. For an MCU 810 having N ports,
there are N network interfaces, one control processor,
one audio processor, one video processor, and one data
processor. For each processor, there are N input signals
coming from N demultiplexers and N output signals going
to the N multiplexers. Therefore, MCU 810 may be
conceptualized as an N-port MCU where only two
communications ports 870, 872 are explicitly shown.

As shown in FIG. 8, first communications port 870
includes network interface 811, demultiplexer 813, and
multiplexer 822. Network interface 811 is a circuit which
provides the conversion function between the standard
line signal coding used by ISDN network 804 and the
Px64 kbps H.221 signal used by MCU 810. Network
interface 811 includes output port 812, which is adapted
to provide an output in the form of an H.221 signal. The
H.221 signal is actually a multiplex of several different
types of information (audio, video, data, control);
therefore, network interface 811 must send the incoming
MCU H.221 signal to a demultiplexing device such as
demultiplexer 813. Likewise, network interface 811 has
an input port 823 adapted to receive an H.221 signal
from multiplexer 822. Multiplexer 822 combines a plurality
of individually-processed signals which are to be
transmitted to a particular endpoint device.

Demultiplexer 813 separates an incoming multimedia
signal stream received from network interface 811
into four components: a first component 814, comprising
electronic signals representing control; a second
component 815, comprising electronic signals representing
audio; a third component 816, comprising electronic
signals representing video; and a fourth component 817,
representing data. The first, second, third, and fourth
components 814, 815, 816, 817 represent outputs of
demultiplexer 813 which are coupled to common internal
switch 819.

Multiplexer 822 accepts a plurality of incoming multimedia
signal components from common internal
switch 819, such as a first component 818 representing
control, a second component 891 representing audio, a
third component 820 representing video, and a fourth
component 821 representing data. The multiplexer 822
integrates the first, second, third, and fourth components
818, 891, 820, 821 into a single multimedia signal
stream which is coupled to network interface 811. This
single multimedia signal stream may be conceptualized
as the output of multiplexer 822. The network interface
811 routes this multimedia signal stream to a specific
endpoint device 801, 802, 803. For second communications
port 872, the four output components are first
component 824, representing control, second component
825, representing audio, third component 826, representing
video, and fourth component 827, representing
data. The four input components to the multiplexer in
second communications port 872 are first component 828,
representing control, second component 829, representing
audio, third component 830, representing video,
and fourth component 831, representing data.
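
The separation and recombination performed by the demultiplexer and multiplexer of each communications port can be illustrated with a minimal sketch. It is not an H.221 implementation: the tagged (type, payload) chunk format, the Components class, and the function names are assumptions introduced only for illustration.

```python
from dataclasses import dataclass

# Hypothetical labels for the four component types named in the text.
COMPONENT_TYPES = ("control", "audio", "video", "data")


@dataclass
class Components:
    """The four per-port components (e.g. 814 to 817 on the demultiplexer side)."""
    control: bytes = b""
    audio: bytes = b""
    video: bytes = b""
    data: bytes = b""


def demultiplex(stream):
    """Split a stream of (type, payload) chunks into four components."""
    parts = {name: bytearray() for name in COMPONENT_TYPES}
    for kind, payload in stream:
        parts[kind] += payload
    return Components(**{name: bytes(buf) for name, buf in parts.items()})


def multiplex(components):
    """Merge the non-empty components into a single list of tagged chunks."""
    return [(name, getattr(components, name))
            for name in COMPONENT_TYPES
            if getattr(components, name)]
```

In this toy model the demultiplexer output would be handed to the common internal switch, and the multiplexer input would come from it; real H.221 framing interleaves the components at the bit level and is not modelled here.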

Common internal switch 819 contains a plurality of
electronic switches, buffers, and/or amplifiers under the
control of control processor 840. Common internal
switch 819 is coupled to audio processor 841 for mixing
and switching electronic signals representing audio;
common internal switch 819 is also coupled to video
processor 842 and data processor 843 for mixing and
switching electronic signals representing video and data,
respectively. Therefore, common internal switch 819
effectively receives four output components from each
communications port 870, 872 and routes these output
components to selected ones of respective processors
(control processor 840, audio processor 841, video
processor 842, and/or data processor 843) within MCU
810. Likewise, common internal switch 819 receives the
output components of each processor in MCU 810 and
routes these outputs to the multiplexer 822 of each
communications port 870.
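
The two routing directions of the common internal switch can be sketched as a pair of lookup tables. This is an illustrative model under stated assumptions: the class name InternalSwitch, its method names, and the dictionary layout are hypothetical and do not describe the actual switch hardware.

```python
from collections import defaultdict

# Hypothetical labels for the four processors of the MCU.
PROCESSORS = ("control", "audio", "video", "data")


class InternalSwitch:
    """Toy model of common internal switch 819 (names are illustrative)."""

    def __init__(self):
        # Inputs gathered for each processor, keyed by source port.
        self.to_processor = {name: {} for name in PROCESSORS}
        # Outputs delivered to each port's multiplexer, keyed by type.
        self.to_multiplexer = defaultdict(dict)

    def route_from_port(self, port, components):
        """Send each demultiplexer output to the processor of its type."""
        for kind, payload in components.items():
            self.to_processor[kind][port] = payload

    def route_from_processor(self, kind, outputs):
        """Send one processor's per-port outputs to the port multiplexers."""
        for port, payload in outputs.items():
            self.to_multiplexer[port][kind] = payload
```

For an N-port MCU, each entry of to_processor would hold up to N inputs, and each processor would return up to N outputs, matching the N-input, N-output description given earlier.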

Common internal switch 819 receives output control
signals from control processor 840 over signal line
851, and provides input control signals to control processor
840 over signal line 850. Common internal switch
819 receives output audio signals from audio processor
841 over signal line 853, and provides input audio signals
to audio processor 841 over signal line 852. Common
internal switch 819 receives output video signals
from video processor 842 over signal line 855, and provides
input video signals to video processor 842 over
signal line 854. Common internal switch 819 receives
output data signals from data processor 843 over signal
line 857, and provides input data signals to data processor
843 over signal line 856. Control processor 840
provides control signals to the audio processor 841, video
processor 842, and data processor 843 over signal
line 844.

ISDN network 804 is connected to MCU 810 over
signal line 805. Within MCU 810, signal line 805 is
parallel-connected to first and second communications
ports 870, 872. For example, in the case of first
communications port 870, signal line 805 is connected to
network interface 811. Network interface 811 is coupled to
demultiplexer 813 over signal line 812, and this network
interface 811 is also coupled to multiplexer 822 over
signal line 823. Signal line 812 is coupled to the input
terminal of demultiplexer 813, and signal line 823 is
coupled to the output terminal of multiplexer 822.

Audio processor 841 includes software and hard- 
ware for processing audio signals. The processing may 
take the form of switching the audio, mixing the audio, 
or both. In the case of audio mixing, the input signal to 
audio processor 841 is an aggregate audio signal con- 
sisting of the audio output signals from all of the com- 
munications ports 870, 872 of MCU 810. For an N-port 
MCU 810, this signal includes the N audio signals from 
the demultiplexers within each communications port 
870, 872. 

To mix the audio, audio processor 841 decodes 
each of the audio inputs, linearly adds the signals ob- 
tained by decoding, and then re-encodes the linear sum. 
For each endpoint device, this linear sum may be sub- 
jected to additional processing steps, so as to provide 
each endpoint device with audio information specific to 
that endpoint device. These additional processing steps 
may include, for example, any of the following: the out- 
put sum for a given endpoint device may exclude that 
endpoint's input; the sum may include inputs whose 
present or recent past values exceed a certain thresh- 
old; or the sum may be controlled from a specially-des- 
ignated endpoint device used by a person termed the 
"chair", thereby providing a feature generally known as 
chair-control. Therefore, the output of the audio processor
841 is in the form of N processed audio signals.
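
The mixing steps described above (linear summation of the decoded inputs, exclusion of an endpoint's own input, and a level threshold) can be sketched as follows. This is a minimal sketch, not the audio processor's actual algorithm: decoding and re-encoding are omitted, the threshold is applied to the peak of the current block only, and the function name and parameters are assumptions.

```python
def mix_audio(decoded, threshold=0.0, exclude_own=True):
    """Return one mixed signal per port from decoded linear samples.

    decoded maps a port identifier to a list of linear samples, all of
    the same length. Decoding and re-encoding are outside this sketch.
    """
    # Keep only inputs whose peak level exceeds the threshold (assumption:
    # the text's "present or recent past values" is reduced to one block).
    active = {port: samples for port, samples in decoded.items()
              if max((abs(s) for s in samples), default=0) > threshold}
    length = len(next(iter(decoded.values()), []))
    mixed = {}
    for port in decoded:
        total = [0] * length
        for source, samples in active.items():
            if exclude_own and source == port:
                continue  # an endpoint does not receive its own input
            for i, sample in enumerate(samples):
                total[i] += sample
        mixed[port] = total
    return mixed
```

For N input ports the function returns N output signals, one per port, which corresponds to the N processed audio signals mentioned in the text.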

In the case of audio switching, the input signal to 
audio processor 841 is a single audio signal which is 
selected from a given communications port 870 or 872, 
based upon control signals received from control proc- 
essor 840. No audio processing is implemented in the 
present example which involves only audio switching. 
The audio input is broadcast to all other audio processor 
841 outputs, either automatically or under manual con- 
trol. 

Data processor 843 includes hardware and software
means for implementing one or both of the functions
generally known to those skilled in the art as
"broadcast" or "MLP". For each type of broadcast data,
data input is accepted from only one endpoint device at
any one time. Therefore, the input signal to data processor
843 is the data output from one of the communications
ports 870, 872. This data output is broadcast to
the other endpoint devices as determined by control
processor 840, according to the capabilities of specific
endpoint devices to receive such data, as set forth in the
capability codes stored in memory units (RAM or ROM)
of respective endpoint devices. For the endpoints which
are selected for picture composition, control processor
840 modifies their capability codes by specifying a
new maximum frame rate based on the output transmission
rate of the communication link, and sends the new
capability codes to the four selected endpoints so they
can produce video bitstreams with the proper maximum
frame rate. If there are no special requirements received
from any of the endpoint devices, the control processor
840 sets the operation mode to mode 1 (the first mode).
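
The data broadcast rule described above (input accepted from only one endpoint at a time, delivery only to endpoints whose capability codes permit it) can be sketched as follows. The class DataBroadcaster, its methods, and the capability name "data" are hypothetical placeholders; they are not actual H.320 capability codes or part of the disclosed data processor.

```python
class DataBroadcaster:
    """Toy model of single-sender data broadcast (names are illustrative)."""

    def __init__(self, capabilities):
        # Maps an endpoint identifier to a set of capability names.
        self.capabilities = capabilities
        self.sender = None  # endpoint currently allowed to send data

    def acquire(self, endpoint):
        """Grant the data channel if no other endpoint holds it."""
        if self.sender in (None, endpoint):
            self.sender = endpoint
            return True
        return False

    def release(self, endpoint):
        """Give up the data channel if this endpoint holds it."""
        if self.sender == endpoint:
            self.sender = None

    def broadcast(self, endpoint, payload):
        """Deliver payload to every other endpoint able to receive data."""
        if endpoint != self.sender:
            raise PermissionError("endpoint does not hold the data channel")
        return {other: payload
                for other, caps in self.capabilities.items()
                if other != endpoint and "data" in caps}
```

In this sketch the decision about which endpoints receive the data is made from a stored capability table, which stands in for the role the text assigns to the control processor.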

Control processor 840 is responsible for determining
the correct routing, mixing, switching, format and
timing of the audio, video, data and control signals
throughout a multimedia conference. The control processor
840 retrieves one or more capability codes from
each endpoint device. Capability codes, which are
stored in endpoint device RAM and/or ROM, specify the
audio, video, data, and/or control capabilities for this
endpoint device. Control processor 840 retrieves the
capability codes from all N endpoint devices participating
in a multimedia conference. These capability codes are
stored in a memory unit (RAM) of MCU 810 so that control
processor 840 can correctly manage the conference
for all endpoint devices. This storage may occur, for
example, in a random-access memory (RAM) device
associated with control processor 840. In turn, MCU 810
sends the capability codes to each of the N communications
ports 870, 872 so that each of the endpoint devices
801, 802, 803 is enabled to communicate with
MCU 810 at a bit rate determined by MCU 810 and
appropriate for that specific endpoint device 801, 802, 803.

Control processor 840 receives inputs which are
entered by conference participants into the user interface
of an endpoint device 801, 802, 803. These inputs
are in the form of chair-control commands and commands
embedded in bit streams conforming to the
H.221 standard. Commands from endpoint devices are
routed to the control processor 840 to ensure the correct
distribution of bit streams to the audio, video, and data
processors 841, 842, 843, respectively, to ensure that
the correct audio decoding algorithm is used at the inputs
to an audio mixer within audio processor 841, and
to ensure that any incoming data is sent to a data broadcast
unit or MLP processor within data processor 843.

The control processor 840 also directs the switching
of the bit streams from the audio, video, and data processors
841, 842, 843, respectively, to each multiplexer
822, 834, and specifies the audio encoding algorithm
used in the audio mixer of audio processor 841, and the
algorithm used at each output from the audio mixer. The
bit streams are routed to and from the various processors
841, 842, 843 by the common internal switch 819,
which is under control of the control processor 840.

Video processor 842, which embodies the picture 
composition techniques of the present invention, proc- 
esses the video signals received from the common in- 
ternal switch 819. The processing may take the form of 
switching the video, mixing the video, or both. In video
switching, the video processor 842 receives one selected
video signal from the switch 819, and transmits the
video signal to some or all other endpoint devices
participating in a given multimedia conference. Video
selection may be automatic or under manual control. For
instance, the audio processor 841 and the video processor
842 may be automatically controlled by control
processor 840, such that an endpoint device with currently
active audio (i.e., an endpoint device used by the
"present speaker" which provides an audio signal to
MCU 810 above a predetermined audio amplitude
threshold) receives the picture of the endpoint device
which previously had active audio (i.e., an endpoint device
used by the "previous speaker"), while all other endpoint
devices receive the picture of the present speaker.

A time delay may be incorporated into the video
switching implemented by video processor 842 to avoid
excessively frequent video image changes caused by
spurious sounds. As in the case of audio switching, video
switching may be controlled directly from a specially-designated
endpoint device used by a person termed
the "chair". If the delay in the video processor 842 and
the delay in the audio processor 841 differ by a significant
(humanly perceptible) amount, a compensating delay
may be inserted into the appropriate bit stream to
retain lip synchronization.

In video mixing, the video processor 842 receives
four selected video bit streams from the switch 819,
and composites the four bitstreams into one video bitstream
by using the picture composition system 100 of
FIG. 1, or the system 700 of FIG. 7, which is embedded
in the video processing unit. The composite bitstream
855 is fed to the common internal switch 819. Through
the switch 819, the composite signal is switched to the
proper endpoint devices via their corresponding
communication ports under the control of the control
processor 840. As in video switching, the video selection
may be automatic or under manual control.



Claims

1. A method for composing a video image from a plurality
of video sources and characterized by the following
steps:

a) receiving electronic signals representing
video information from each of a plurality of
video sources; and

b) combining the electronic signals from each
of a plurality of video sources into one composite
video signal; the composite video signal representing
a video image having a plurality of
rectangular regions, at least one rectangular
region including video information from one of
the plurality of video sources.

2. A method for composing a video image as set forth
in claim 1 wherein step (b) further includes the step
of providing a composite video signal wherein each
rectangular region includes video information from
a corresponding one of the plurality of video
sources.

3. A method for composing a video image as set forth
in claim 1

characterized in that electronic signals are
received from each of a first, a second, a third
and a fourth video source; the composite video
image being partitioned into a first, a second, a
third, and a fourth rectangular region;

step (b) further characterized by the steps of
placing information from the first video source
into the first rectangular region; placing information
from the second video source into the
second rectangular region; placing information
from the third video source into the third rectangular
region; and placing information from the
fourth video source into the fourth rectangular
region.

4. A method for composing a video image as set forth
in claim 1 characterized in that the electronic signals
are each in the form of a digital bit stream; step (b)
further characterized by the step of matching the bit
rates of each of the electronic signals to an arbitrarily
selected bit rate.

5. A method for composing a video image as set forth
in claim 1 characterized in that the electronic signals
are each in the form of a digital bit stream; the
electronic signals characterized by a first electronic
signal having a first bit rate, and a second electronic
signal having a second bit rate faster than the first
bit rate, step (b) further characterized by the step of
matching the bit rates of each of the electronic
signals to the first bit rate.